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Abstract. Feasible tomography schemes for large particle numbers must possess, 
besides an appropriate data acquisition protocol, also an efficient way to reconstruct the 
density operator from the observed finite data set. Since state reconstruction typically 
requires the solution of a non-linear large-scale optimization problem, this is a major 
challenge in the design of scalable tomography schemes. Here we present an efficient 
state reconstruction scheme for permutationally invariant quantum state tomography. 
It works for all common state-of-the-art reconstruction principles, including, in 
particular, maximum likelihood and least squares methods, which are the preferred 
choices in today's experiments. This high efficiency is achieved by greatly reducing the 
dimensionality of the problem employing a particular representation of permutationally 
invariant states known from spin coupling combined with convex optimization, 
which has clear advantages regarding speed, control and accuracy in comparison to 
commonly employed numerical routines. First prototype implementations easily allow 
reconstruction of a state of 20 qubits in a few minutes on a standard computer. 



1. Introduction 

Full information about the experimental state of a quantum system is naturally highly 
desirable because it enables one to determine the mean value of each observable and thus 
also of each other property of the quantum state. Abstractly, such a complete description 
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is given for example by the density operator, a positive semidefinite matrix p with unit 
trace. Quantum state tomography pQ refers to the task to determine the density operator 
for a previously unknown quantum state by means of appropriate measurements. Via 
the respective outcomes more and more information about the true state generating the 
data is collected up to the point where this information uniquely specifies the particular 
state. Quantum state tomography has successfully been applied in many experiments 
using different physical systems, including trapped ions [2] or photons [3] as prominent 
instances. 

Unfortunately, tomography comes with a very high price due to the exponential 
scaling of the number of parameters required to describe composed quantum systems. 
For an N qubit system the total number of parameters of the associated density operator 
is 4 N — 1 and any standard tomography protocol is naturally designed to determine all 
these variables. The most common scheme used in experiments [I] consists of locally 
measuring in the basis of all different Pauli operators and requires an overall amount 
of 3^ different measurement settings with 2 N distinct outcomes each. Other schemes, 
e.g., using an informationally complete measurement [5] locally would require just one 
setting but, nevertheless, the statistics for 4^ different outcomes. Hence the important 
figure of merit to compare different methods is given by the combination of settings and 
outcomes. 

For such a scaling, the methods rapidly become intractable, already for present 
state-of-the-art experiments: Recording for example the data of 14 trapped ions [B], 
currently the record for quantum registers, would require about 150 days, although 100 
measurement outcomes can be collected for a single setting in only about three seconds. 
In photonic experiments this scaling is even worse because count rates are typically 
much lower, e.g., in recent eight-photon experiments [8] a coincidence of single click 
events occurs only at the order of minutes, hence it would require about seven years to 
collect an adequate data set. This directly shows that more sophisticated tomography 
techniques are mandatory. 

New tomography protocols equipped with better scaling behaviour exploit the idea 
that the measurement scheme is explicitly optimized only for particular kinds of states 
rather than for all possible ones. If the true unknown state lies within this designed 
target class then full information about the state can be obtained with much less 
effort, and if the underlying density operator is not a member then a certificate signals 
that tomography is impossible in this given case. Recent results along this direction 
include tomography schemes designed for states with a low rank [HI [El El , particularly 
for important low rank states like matrix product [12] or multi-scale entanglement 
renormalization ansatz states [13J. Other schemes include a tomography scheme based 
on information criteria [2] or — the topic of this manuscript — states with permutation 
invariance [15]. 

However it needs to be stressed that in any real experiment all these tomography 
schemes must cope with another, purely statistical challenge: Since only a finite 
number of measurements can be carried out in any experiment one cannot access 
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the true probabilities predicted by quantum mechanics pk = tr(p truc Mk), operator 
Mfc describing the measurements, but merely relative frequencies fa. Though these 
deviations might be small, the approximation fa rs pk causes severe problems in the 
actual state reconstruction process, i.e., the task to determine the density operator 
from the observed data. If one naively uses the frequencies according to Born's rule 
fa = tr(pi in Mfc) and solves for the unknown operator p\ in , then, apart from possible 
inconsistencies in the set of linear equations, the reconstructed operator p\ n ^ is often 
not a valid density operator anymore because some of its eigenvalues are negative. Hence 
in such cases this reconstruction called linear inversion delivers an unreasonable answer. 
It should be kept in mind that inconsistencies can also be due to systematic errors, 
e.g., if the true measurements are aligned wrongly relative to the respective operator 
representation [16j Eh but such effects are typically ignored. 

Therefore, statistical state reconstruction relies on other principles than linear 
inversion. In general, these methods require the solution of a non-linear optimization 
problem, which is much harder to solve than just a system of linear equations. For large 
system sizes this becomes, besides the exponential scaling of the number of settings and 
outcomes, a second major problem, again due to the exponential scaling of the number 
of parameters of the density operator. In fact for the current tomography record of eight 
ions in a trap [2], this actual reconstruction took even longer than the experiment itself 
(one week versus a couple of hours). Hence, feasible quantum state tomography schemes 
for large systems must, in addition to an efficient measurement procedure, possess also 
an efficient state reconstruction algorithm, otherwise they are not scalable. 

In this paper we develop a scalable reconstruction algorithm for the proposed 
permutationally invariant tomography scheme [15] . It works for common reconstruction 
principles, including, among others, maximum likelihood and least squares methods. 
This scheme becomes possible once more by taking advantage of the particular symmetry 
of this special kind of states, which provides an efficient and operational way to store, 
characterize and even process those states. This method enables a large dimension 
reduction in the underlying optimization problem such that it gets into the feasible 
regime. The final low dimensional optimization is performed via non-linear convex 
optimization which offers great advantages in contrast to commonly used numerical 
routines, in particular regarding numerical stability and accuracy. Already a first 
prototype implementation of this algorithm allows state reconstruction for 20 qubits 
in a few minutes on a standard desktop computer. 

The outline of the manuscript is as follows: Section [2] summarizes the background 
on permutationally invariant tomography and on statistical state reconstruction. The 
key method is explained in Sec. |3] and highlighted via examples in Sec. HJ that are 
generated by our current implementation. Section collects all the technical details: 
the mentioned toolbox, more notes about convex optimization, additional information 
about the pretest or certificate and the measurement optimization, both addressed for 
large qubit numbers. Finally, we conclude and provide an outlook on further directions 
in Sec. El 
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2. Background 

2.1. Permutationally invariant tomography 

Permutationally invariant tomography has been introduced as a scalable reconstruction 
protocol for multi-qubit systems in Ref. [15]. It is designed for all states of the system 
that remain invariant under all possible interchanges of its different particles. Abstractly 
such a permutationally invariant state ppi of N qubits can be expressed in the form, 



where V(p) is the unitary operator which permutes the N different subsystems according 
to the particular permutation p and the summation runs over all possible elements of 
the permutation group Sn- Many important states, like Greenberger-Horne-Zeilinger 
states or Dicke states fall within this special class. 

As shown in Ref. [15], full information of such states can be obtained by using in 
total ( N ^ 2 ) = (N 2 + 3iV + 2)/2 different local binary measurement settings, while for 
each setting only the count rates of (N + 1) different outcomes need to be registered. 
This finally leads to a cubic scaling in contrast to the exponential scaling of standard 
tomography schemes. 

The measurement strategy that attains this number runs as follows: Each setting is 
described by a unit vector a G R 3 which defines associated eigenstates \i) a of the trace- 
less operator a ■ a. Each party locally measures in this basis and registers the outcomes 
"0" or "1" respectively. The permutationally invariant part can be reconstructed from 
the collective outcomes, i.e., only the number of "0" or "1" results at the different 
parties matters but not the individual site information. The corresponding coarse- 
grained measurements are given by 



with k = 0, . . . , N, and where the summation p' is over all permutations that give distinct 
terms. In total one needs the above stated number of different settings a. These settings 
can be optimized in order to minimize the total variance which provides an advantage 
in contrast to random selection. 

In addition to this measurement strategy there is also a pretest which estimates 
the "closeness" of the true, unknown state with respect to all permutationally invariant 
states from just a few measurement results [15]. This provides a way to test in advance 
whether permutationally invariant tomography is a good method for the unknown state. 

Restricting to the permutationally invariant part of a density operator has already 
been discussed in the literature; for example for spins in a Stern-Gerlach experiment [T8] 
or in terms of the polarization density operator P3B EDI EE] • Here, due to the restricted 
class of possible measurements, only the permutationally invariant part of, in principle 
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distinguishable, particles is accessible [201 EI]- This is a strong conceptual difference to 
permutationally invariant tomography where one intentionally constrains itself to this 
invariant part. Nevertheless it should be emphasized that the employed techniques are 
similar. 

2.2. Statistical state reconstruction 

Since standard linear inversion of the observed data typically results in unreasonable 
estimates as explained in the introduction, one employs other principles for actual state 
reconstruction. In general one uses a certain fit function F(p) that penalizes deviations 
between the observed frequencies fk and the true probabilities predicted by quantum 
mechanics Pk{p) = tr(pMfe) if the state of the system is p. The reconstructed density 
operator p is then given by the (often unique) state that minimizes this fit function, 

p = arg minF(p), (4) 

p>0 

hence the reconstructed state is precisely the one which best fits the observed data. Since 
the optimization explicitly restricts to physical density operators this assures validity of 
the final estimate p in contrast to linear inversion. Depending on the functional form of 
the fit function different reconstruction principles are distinguished. A list of the most 
common choices is provided in Tab. 12.21 



Reconstruction principle 


Fit function F(p) 


Maximum Likelihood [22] 


-Efe/fe lo g[Pfc(p)] 


Least Squares [23] 


Y.k w k[fk -Pfc(p)] 2 , w k > 


Free Least Squares [I] 


Ek 1 /Pk(p)[fk-Pk{p)} 2 


Hedged Maximum Likelihood [2J 


-E k fklog\p k (p)} -/31og[det(p)], > 



Table 1. Common reconstruction principles and their corresponding fit functions F 
used in the optimization given by Eq. (j4]); see text for further details. 



The presumably best-known and most often employed method is called maximum 
likelihood principle [22] . Given a set of measured frequencies fk the maximum likelihood 
state p m \ is exactly the one with the highest probability to generate these data. Other 
common fit functions, usually employed in photonic state reconstruction, are different 
variants of least squares [UE2] that originate from the likelihood function using Gaussian 
approximations for the probabilities. There this is often also called maximum likelihood 
principle but we distinguish these, indeed different, functions here explicitly. Typically 
the weights in the least squares function are set to be Wk = 1/ fk because fk represents an 
estimate of the variance in a multinomial distribution, cf. the free least squares principle. 
However, this leads to a strong bias if the counts rates are extremal, e.g., if one of the 
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outcomes is never observed this method naturally leads to difficulties. A method to 
circumvent this is given by the free least squares function [3] or using improved error 
analysis for rare events. Let us stress that all these principles have the property that if 
linear inversion delivers a valid estimate pi; n > 0, it is also the estimate given by these 
reconstruction principles 0- 

Finally, hedged maximum likelihood [23] represents a method that circumvents low 
rank state estimates. Via this one obtains more reasonable error bars using parametric 
bootstrapping methods [25] ; for other error estimates we refer to the recently introduced 
confidence regions for quantum states [2~S1|2T|. In principle many more fit functions are 
possible, like generic loss functions [28], but considering these is out of the scope of this 
work. 

3. Method 

From the previous it is apparent that permutationally invariant state reconstruction 
requires the solution of 



for the preferred fit function. This large-scale optimization becomes feasible along the 
following lines: 

First, one reduces the dimensionality of the underlying optimization problem 
because one cannot work with full density operators anymore. This requires an 
operational way to characterize permutationally invariant states ppi > over which 
the optimization is performed, and additionally demands an efficient way to compute 
probabilities pk(ppi) which appear in the fit function. Second, one needs a method to 
perform the final optimization. We employ convex optimization for this task. 

3.1. Reduction of the dimensionality 

This reduction relies on an efficient toolbox to handle permutationally invariant states, 
which exploits the particular symmetry. Here we explain this method and the final 
structure; for more details see Sec. I5.1[ These techniques are well-proven and established; 
we employ and adapt them here for the permutationally invariant tomography scheme 
such that we finally reach state reconstruction of larger qubits. 

The methods of this toolbox are obtained via the concept of spin coupling that 
describes how individual, distinguishable spins couple to a combined system if they 
become indistinguishable. Since we deal with qubits we only need to focus on spin- 
1/2 particles. In the simplest case, two spin-1/2 particles can couple to a spin-1 
system, called the triplet, if both spins are aligned symmetric, or to a spin-0 state, 
the singlet, if the spins are aligned anti-symmetric. Abstractly, this can be denoted as 

\ For the least squares fit functions this follows because F = in this case and clearly F > for 
those functions. In the case of the likelihood it follows from positivity of the classical relative entropy 
between probability distributions. 




Ppi>0 
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C 2 © C 2 = C 3 © C 1 . If one considers now three spins, then of course all spins can point 
in the same direction giving a total spin-3/2 system. There is also to a spin-1/2 system 
possible if two particles form already a spin-0 and the remaining one stays untouched. 
This can be achieved however by more than one possibility, in fact by two inequivalent 
choices @, and is expressed by C 2 © C 2 © C 2 = C 4 © (C 2 © C 2 ). 

This scheme can be extended to N spin-1/2 particles to obtain the following 
decomposition of the total Hilbert space, 

N/2 

U = (<C 2 )® N = Hj © JCj. (6) 

J— J mi ii 

where the summation runs over different total spin numbers j = jmin, jmin + 1, • • • , N/2 
starting from j min G {0, 1/2} depending on whether N is even or odd. Here, %j are 
called the spin Hilbert spaces with dimensions dim = 2j + 1, while ICj are referred 
as multiplicative spaces that account for the different possibilities to obtain a spin-j 
state. They are generally of a much larger dimension, cf. Eq. f[2"2"j) . 

Permutationally invariant states have a simpler form on this Hilbert space 
decomposition, namely 

N/2 1 

PPI = Wi ® x^-y (7) 

3 — Jmin 

with density operators pj called spin states and according probabilities pj. Thus 
a permutationally invariant density operator only contains non-trivial parts on the 
spin Hilbert spaces while carrying no information on the multiplicative spaces. Note 
further that there are no coherences between different spin states. This means 
that any permutationally invariant state can be parsed into a block structure as 
schematically depicted in Fig. [TJ The main diagonal is built up by unnormalized 
spin states pj = PjPj/ dim(/C,), which appear several times, the number being equal 
to the dimension of the corresponding multiplicative space. This block-decomposition 
represents a natural way to treat permutationally invariant states and has for example 
been employed already in the aforementioned related works of permutationally invariant 
tomography $T8\ US EH EE] but also in other contexts [291130]. 

This structure shows that if we work with permutationally invariant states we do 
not need to consider the full density operator but rather that it is sufficient to deal only 
with this ensemble of spin states. Therefore we identify from now on 

PPI PjPj, j = jmin, jmin + 1, • • • , N/2. (8) 

This provides already an efficient way to store and to visualize such states. More 
importantly, it also enables an operational way to characterize valid states, since any 

§ More precisely, all states of the form \ifj) = V(p) |0) with p being any possible permutation, are 

states of total spin j = 1/2 and projection m = 1/2 to the collective spin operators J,; = Yl n =i <T i;n/2, 
<7i-„ being the corresponding Pauli operator on qubit n. However as can be checked these states only 
span a 2-dimensional subspace. 
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Figure 1. Block decomposition for a generic permutationally invariant state as given 
by Eq. ([7]) with pj = PjPj/ dim(/Cj). 



permutationally invariant operator ppi represents a true state if and only if all these spin 
operators pj are density operators and pj a probability distribution. This is in contrast 
to the generalized Bloch vector employed in the original proposal of permutationally 
invariant tomography [15] given by Eq. ( 152]) . which also provides efficient storage and 
processing of permutationally invariant states, but where Bloch vectors of physical states 
are not as straightforward to characterize. 

Via this identification one can demonstrate once more the origin of the cubic 
scaling of the permutationally invariant tomography scheme. The largest spin state 
is of dimension N+l which requires parameters on the order of N 2 for characterization. 
Since one has on the order of N of these states this shows results in a cubic scaling. 

Fixing the ensemble of spin states as parametrization it is now required to obtain an 
efficient procedure to compute the probabilities p%{ppi) for the optimized measurement 
scheme. This is achieved as follows: First let us stress that a similar block decomposition 
as given by Eq. (j7|) holds for all permutationally invariant operators. Hence also the 
measurements M£ given by Eq. (J2]) can be cast into this form. Using the convention 

N/2 

M a k = M a K3 ® 1 (9) 

3 J mi ii 

leads to 

N/2 

tr(p P1 M a k )= PMPiMii)- (10) 

3 Jmin 

Therefore the problem is shifted to the computation of the spin-j operators M% • for 
each setting a. As we show in Prop. !5.2l below. using the idea that measurements can be 
transformed into each other by a local operation U a \i) = \i) a this provides the relation 

Mi, = W«M%wf. (11) 
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Here M^ 3 - corresponds to the measurement in the standard basis (that need to be 



where Sij stands for the spin operators in dimension 2j + 1. This finally provides the 
efficient way to compute probabilities. 

3.2. Optimization 

As a second step one still needs to cope with the optimization itself. Although 
there are different numerical routines for statistical state reconstruction like maximum 
likelihood [31] or least squares (U [32], we prefer non-linear convex optimization [33J 
to obtain the final solution. Quantum state reconstruction problems are known to be 
convex [321 131] , but convex optimization has hardly been used for this task. However, 
convex optimization possesses several advantages: First of all it is a systematic approach 
that works for any convex fit function, including maximum likelihood and least squares. 
In contrast to other algorithms such as the fixed-point algorithm proposed in Ref. [3TJ 
it gives a precise stopping condition via an appropriate error control (see, however 
Ref. [3S]) and still exploits all the favourable, convex, structure in comparison to re- 
parametrization ideas as in Ref. [I] . Moreover it is guaranteed to find the global optimum 
and the obtained accuracy is typically much higher than with other methods. 

Quantum state reconstruction as defined via Eq. (JJJ) can be formulated as a 
convex optimization problem as follows: All fit functions listed in Tab. 12.21 are convex 
on the set of states. Via a linear parametrization of the density operator p(x) = 
1/ dim('H) + J2 x iBi, using an appropriate operator basis Bi such that normalization is 
fulfilled directly, the required optimization problem becomes 



with a convex objective function F(x) = F[p(x)} and a linear matrix inequality as 
constraint, i.e., precisely the structure of a non-linear convex optimization problem [33J. 
For permutationally invariant states one uses p(%) = ®jPj(x) with pj = pjpj by using 
an appropriate block-diagonal operator basis Bf, therefore we continue this discussion 
with the more general form. 

The optimization given by Eq. ( fl3|) can be performed for instance with the help of 
a barrier function [33] Ijjl Rather than considering the constrained problem one solves 

j| Let us stress that both least squares options can be parsed into a simpler convex problem, called 
semidefinite program, as for instance shown in Ref. [32j [23] , but that this does not work with the true 
maximum likelihood function to our best knowledge. 




min F[p(x)} 



(13) 
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the unconstrained convex task given by 

min F[p(x)] — t logfdet p(x)} 



(14) 



X 



where the constraint is now directly included in the objective function. This so-called 
barrier term acts precisely as its name suggests: If one tries to leave the strictly feasible 
set, i.e., all parameters x that satisfy p(x) > 0, one always reaches a point where 
at least one of the eigenvalues vanishes. Since the barrier term is large within this 
neighbourhood, in fact singular at the crossing, it penalizes points close to the border 
and thus ensures that one searches for an optimum well inside the region where the 
constraint is satisfied. The penalty parameter t > plays the role of a scaling factor. 
If it becomes very small the effect of the barrier term becomes negligible within the 
strictly feasible set and only remains at the border. Therefore a solution of Eq. (|T4|) 
with a very small value of t provides an excellent approximation to the real solution. 
As shown in Sec. 15.21 this statement can be made more precise by 



which follows from convexity and which relates the true solution x so i of the original 
problem given by Eq. ffT3l) to the solution x* ol of the unconstrained problem with 
penalty parameter t. This condition represents the above mentioned error control 
and serves as a stopping condition, i.e., as a quantitative error bound for a given t. 
Note that for permutationally invariant tomography dim('H) is not the dimension of 
the true iV-qubit Hilbert space but instead the dimension of p(x) = @jpj(x), i.e., 
. (2j + 1) = (N + l)(N + 2j min + l)/4 which increases only quadratically. 

Small penalty parameters are approached by an iterative process: For a given 
starting point x" tart and a certain value of the parameter t n > one solves Eq. (TT4")) . Its 
solution will be the starting point for x^} t = x™ ol for the next unconstrained optimization 
with a lower penalty parameter t n+ \ < t n . As starting point for the first iteration we 
employ to = 1 and the point x® tart corresponding to the totally mixed state. This 
procedure is repeated until one has reached small enough penalty parameters. The 
penalty parameter is decreased step- wise. Then each unconstrained problem can be 
solved very efficiently since one starts already quite close to the true solution. 

Let us point out that via the above mentioned barrier method one additionally 
obtains solutions to the hedged state reconstruction with (3 = t since the unconstrained 
problem given by Eq. (fT4"|) is precisely the fit function of the hedged version of Tab. 12.21 

Finally for comparative purposes we also like to mention the iterative fixed- 
point algorithm of Ref. [35]; for a modification see Ref. [37]. It is designed for 
maximum likelihood estimation and is straightforward to implement since it only 
requires matrix multiplication, however, it has deficits regarding control and accuracy. 
For permutationally invariant tomography the algorithm can be adapted as follows: 
Given a valid iterate pp T characterized by the ensemble of spin states p™ = PjPj one 
evaluates the probabilities pl(ppi) using Eq. (TTU|) . Next, one computes the operators 



F[p(xt ol )]-F[p(x sol )}<tdhn(U) 



(15) 




fi 



•a 



(16) 
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which determine the next iterate p™ +1 = R^RjjM with M = ^j^i^PjRj)- 
This iteration is started for example from the totally mixed state and repeated until a 
sufficiently good solution is obtained. 

4. Examples 

The two methods from the previous section are employed in a prototype implementation 
under MATLAB, which already enables state reconstruction of about 20 qubits on a 
standard desktop computer. 

The current algorithm is tested along the following lines: For a randomly generated 
permutationally invariant state ppf c we compute the true probabilities p% true for the 
chosen measurement settings. Rather than sampling we set the observed frequencies 
equal to this distribution, i.e., f% = pltme- m this way linear inversion would return the 
original state, hence also each other reconstruction principle from Tab. I2.2l has this state 
as solution. We now start the algorithm and compare the trace distance between the 
analytic solution pp™ 6 and the state after n iterations pgj. This distance \ tr \ppf c — ppi\ 
quantifies the probability with which the two states, the true analytic solution and the 
iterate after n steps in the algorithm, could be distinguished [38] . 





Algorithm test [12 Qubits] 
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Figure 2. Trace distance between the analytic solution and the estimate after n 
algorithm steps for the three most common reconstruction principles from Tab. I2.2I 



A typical representative of this process is depicted in Fig. |2] for 12 qubits using 
optimized settings. The randomly generated state pp™ was chosen to lie at the boundary 
of the state space since such rank-reduced solutions better resemble the case of state 
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reconstruction of real data. More precisely, each spin state of the true density operator 
is given by a pure state pj nc = \ipj) chosen according to the Haar measure, while 
the pj are selected by the symmetric Dirichlet distribution with concentration measure 
a = 1/2 [32] • As apparent the algorithm behaves similar for all three reconstruction 
principles and rapidly obtains a good solution after about 70 iterations. The steps in 
this plot are points where the penalty parameter is reduced by a factor of 10 starting 
from t — 1 and decreased down to t = 10~ 10 . The slight rise after these points comes 
from the fact that we plot the trace distance and not the actual function (fit-function 
plus penalty term) that is minimized. 

Maximum Likelihood: Convex optimization vs. iterative fixed-point algorithm [20 Qubits] 

1 1 1 1 1 1 

Convex Optimization 




Iterative fixed-point algorithm 



200 400 600 800 1000 1200 1400 

Time [sec] 

Figure 3. Comparison of the maximum likelihood principle between convex 
optimization and the iterative fixed-point algorithm for the described testing procedure 
with respect to accuracy and algorithm time. 

Figure [3] shows a similar comparison for the maximum likelihood reconstruction 
of 20 qubits but now plotted versus algorithm time 0- For comparison we include the 
performance of the iterative fixed-point algorithm, which requires much more iterations 
in general (3000 in this case vs. about 90 for convex optimization). Let us emphasise that 
a similar behaviour between these two algorithms appears also for smaller qubit numbers. 
As one can see, convex optimization delivers a faster and in particular more accurate 
solution. In contrast, the iterative fixed-point algorithm shows a bad convergence rate 
although it initially starts off better. This was one of the main reasons for us to switch 
to convex optimization. 

The current algorithm times of this test are listed in Tab. [2]which are averaged over 

t All simulations were performed on an Intel Core i5-650, 3.2 Ghz, 8 GB RAM using MATLAB 7.12. 
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N 


= 8 


N = 


= 12 


A r = 16 


N -- 


= 20 


Maximum Likelihood 


Algorithm test 


8.5 


sec 


17 


sec 


2.7 min 


9.2 


min 


Simulated experiment 


9.2 


sec 


18 


sec 


2.9 min 


9.3 


min 


Least Squares 


Algorithm test 


8.4 


sec 


39 


sec 


2.5 min 


6 min 


Simulated experiment 


9.2 


sec 


43 


sec 


2.7 min 


6.7 


min 



Table 2. Current performance of the convex optimization algorithm on the described 
test procedure and on frequencies from simulated experiments; free least squares 
provides similar results to the maximum likelihood principle. 



50 randomly generated states. Thus already this prototype implementation enables state 
reconstruction of larger qubit numbers in moderate times. The small time difference 
between reconstruction principles is because least squares as a quadratic fit function 
provides some advantages in the implementation. More details about this difference are 
given in Sec. [5j 

Table [2] also contains the algorithm times for reconstructions using simulated 
frequencies = n%/N T . For each setting they are deduced from the count rates n%. 
sampled from a multinomial distribution using the true event distribution pi true and 
N T = 1000 repetitions. The true probabilities correspond to the same states as already 
employed in the algorithm test. From the table one observes that state reconstruction 
for data with count statistics requires only slightly more time than the algorithm test 
with the correct probabilities. We attribute this to the fact that a few more iterations 
are typically required in order to achieve the desired accuracy. 

Finally let us perform the reconstruction of a simulated experiment of N = 14 
qubits. Suppose that one intends to create a Dicke state \D k:N ) as given by Eq. (12"7|) . 
but that the preparation suffers from some imperfections such that at best one can 
prepare states of the form 

Pdicke = [ k ) pk{1 ~ lDk ' N) {Dk ' Nl ' 

where p = 0.6 characterizes some asymmetry. As the true state prepared in the 
experiment we now model some further imperfection in the form of an additional 
small misalignment U® N , with U = exp(—i9a y /2), 9 = 0.2, and some permutationally 
invariant noise crpi (chosen via the aforementioned method but using Hilbert-Schmidt 
instead of the Haar measure), i.e., 

p true = 0.6 U m Pdickc U^ N + 0.4 a Pl . (18) 
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The frequencies are obtained via sampling from the state given by Eq. (I18j) using 
intentionally only N r = 200 repetitions per setting (to see some differences). Finally, we 
reconstruct the state according to the maximum likelihood principle. Figure H] shows 
the tomography bar plots of one of these examples for the largest spin state PjPj, 
j = N/2 = 7 for both states. Though this state might be artificial this example should 
highlight once more that this state reconstruction algorithm works also for realistic data 
and for qubit sizes where clearly any non-tailored state reconstruction scheme would 
fail. Moreover, it demonstrates that the spin ensemble pjpj represents a very convenient 
graphical representation of such states (compared to a 2 14 x 2 14 matrix in this case). 



True spin state Maximum likelihood spin state 




Figure 4. The real part of the true and reconstructed (according to maximum 
likelihood) largest spin ensemble PjPj, j — N/2 = 7 using the optimal measurement 
setting. The basis is given by the Dickc basis l-Dfc.14), cf. Eq. (1271) . 



5. Details 

5.1. Reduction 

Let us first give more details regarding the reduction step. This starts by recalling a 
group theoretic result summarized in the next section, which is then used to show how 
the stated simplifications with respect to states and measurements are obtained. 

5.1.1. Background Consider the following two unitary representations defined on the N 
qubit Hilbert space: The permutation or symmetric group V(p) which is defined by their 
action onto a standard tensor product basis by V(p) . . . , i^) = Iv-^i), • • • , V" 1 ^)) 
according to the given permutation p, and the tensor product representation W(U) = 
jj®n Q f S p ec i a i unitary group. A result known as the Schur-Weyl duality f4Dl HI] 
states that the action of these two groups is dual, which means that the total Hilbert 
space can be divided into blocks on which the two representations commute. More 



Permutationally invariant state reconstruction 



15 



precisely one has 

N/2 

(€ 2 f N = Hi®** (19) 

J Jmin 

N/2 

V(p) = l®^(p), (20) 

3 Jmin 

AT/2 

H/(C/) = Wj(U)®t. (21) 

J ~ J min 

Here Vj and are respective irreducible representations, and j m i n € {0, 1/2} depending 
on whether N is even or odd. The dimensions of the appearing Hilbert spaces are 
dim(T^j) —2j + l and 

for all j < N/2 and dim (K.N/2) = 1- Let us note that Eq. (120]) already ensures the 
block- diagonal structure of permutationally invariant operators, while Eq. (121)) becomes 
important for the measurement computation. 

A basis of the Hilbert space %j <g> Kj is formed by the spin states \j, m, a) = 
\j,m) <g) |cKj) with m = —j,...,j and atj = 1, . . . , dim(/Cj). These are obtained by 
starting with the states having the largest spin number m = j, which are given by 

\3J,1) = \0} m ®\rf N - 2 \ (23) 
\j,j,a) = J2^ P V(p)\j,j,l), (24) 

v 

for all a > 2. The coefficients Cj jP must ensure that the states \j,j,a) are orthogonal, 
otherwise their choice is completely free since the detailed structure of different cc's is 
not important. The full basis is obtained by subsequently applying the ladder operator 
J_ = J2n=i a -;n t° decrease the spin number m. Here <r_ ;n refers to the operator with 
C- = (o~ x — iOyj/2 on the n-th qubit and identity on the rest. Thus in total the basis 
becomes 

\j,m,a)=NJt m \j,j,a), (25) 

with appropriate normalizations A/". Note that the subspace corresponding to the highest 
spin number j = N/2 is also called the symmetric subspace, which contains many 
important states like Greenberger-Horne-Zeilinger or Dicke states, which using the spin 
states read as 

|GHZ) = -L (|o)«* + = -^(|JV/2, N/2, 1) + \N/2, -N/2, 1)), (26) 

\D kN ) =AT\\lf k ® \0f N - k ] =\N/2,N/2-k,l) . (27) 

J pi 
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5.1.2. Permutationally invariant states and measurement operators Let us now employ 
this result in order to derive a generic form for permutationally invariant states; we give 
the proof for completeness. 

Proposition 5.1 (Permutationally invariant states). Any permutationally invariant 
state of N qubits p-pi defined via Eq. (TJP can be written as 

N/2 



pp. = »« ® ' (28) 

./ ,7 m in 

hence it is fully characterized already by the ensemble PjPj. Moreover p-pi is a density 
operator if and only if all pj are density operators and pj a probability distribution. 

Proof. The proposition follows using the representation V(p) given by Eq. (120]) in 
the definition of the states Eq. (JTJ and then applying Schur's lemma jlQj HI]. This 
lemma states that any linear operator A from JCj to /Q which commutes with all 
elements p of the group Vi(p)A = AVj(p) must either be zero if % and j label 
different irreducible representations or A = cl if they are unitarily equivalent. Since 
Api — 1/N\ J2 P Vi(v)AVj{j)y fulfils this relation one obtains 

-^E*)**)'-^^^. (») 

The normalization can be checked taking the trace on both sides. Adding appropriate 
identities provides 

i J2 1 ® V *(P) B1 ® V M = % ^A B ) ® dirrr^) (30) 

where B should now be a linear operator from TLj (g) JCj to Hi (g> /Q. 

Finally let Pj denote the projector onto 7^- ® /Cj and using Eq. (130]) delivers 

^(p)pV» f = E a?, E Py{p)P^P 3 V{p) ] P r (31) 

= E 1 4r E p ^ ® ^(p)]^ppi^[i ® ^(p)]*^ 1 (32) 



JV! 

»j v. p 



1 



dim(/C 



F, (33) 



which provides the general structure. 

The state characterization part follows because positivity of a block-matrix is 
equivalent to positivity of each block. □ 

Next let us concentrate on the measurement part. Though the block decomposition 
follows already from the previous proposition, it is here more important to obtain an 
efficient computation of each measurement block for the selected setting. 
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Proposition 5.2 (Measurement operators). The POVM elements M£ as defined in 
Eq. for any local setting o6R 3 can be decomposed as M% = ®^ <S> 1 

= WAUjMftWiWy . (35) 

TTie unitary is given by Wj(U a ) = exp(— % ^ z USij) using the spin operators Sij in 
dimension 2j + 1, while the coefficients t\ are determined by U a = exp(— i ^ tio~i/2) 
which satisfies a ■ a — U a a z U^. For the measurement in the standard basis a = e% one 
gets 

= \j, N/2 -k){j, N/2 -k\ (36) 
if —j < N/2 — k < j and zero otherwise. 

Proof. The basic idea is to consider the measurement in an arbitrary local basis a by 
a rotation followed by the collective measurement in the standard basis. The block 
decomposition is obtained as follows 

Ml = U® N Mt*ui® N = W{U a )[Q)MZ ® l] W(U a y (37) 

j 

= (Bw,(U a )M e k ^(U a y ® 1. (38) 

j 

The first step holds because U a satisfies |i) a (i| = U a \i) (i\ U^, while the block 
decomposition of the standard basis measurement M^ 3 is employed afterwards. In the 
last part one uses the tensor product representation given by Eq. (T2TT) . 

Since one knows that Wj is irreducible it can be uniquely written in terms of its 
Lie algebra representation dWj as Wj(U a ) = Wj(e~ lX ) = e~ tdWj ^ x \ which is given by 
the spin operators in this case, i.e., dWj(o~i/2) = Sij [12] . 

Thus it is left to compute the measurement blocks M^- for the standard basis. Note 
it is sufficient to evaluate Mkj <8> such that one can employ the spin basis states 

\j, m, 1) as introduced in Sec. 15. 1.11 At first note that M^- exactly contains k projections 
onto |0), while each basis state \j, m, 1) possesses (N/2+m) zeros. Therefore one obtains 
Ml 3 - \ j,m, 1) oc 5k,N/2+m \ji m i !)• Since each POVM has to resolve the identity this is 
only possible if each M^ 3 - is the stated rank-1 projector on the basis states. □ 

Finally one still needs to express U a = exp(— i J2i h°~i/ c £) for the chosen setting 
a G R 3 . Since this can be related to a familiar rotation |32] these coefficients can be 
expressed as ti = (9h)i via a rotation about an angle 9 around the axis n. Since this 
rotation should turn £3 into a its parameters are given by 

~ 63 x a 

n = — — — , (39) 

||e 3 x a\\ 2 

9 = arccos(e3 ■ a), (40) 
and h = ei (or any other orthogonal vector) if a = ±63. 
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5.2. Convex optimization 

In this part we collect some more details regarding the described convex optimization 
algorithm; for a complete coverage we refer to the book [33J. 

Each unconstrained optimization given by Eq. ( |14p is solved via a damped Newton 
algorithm. The minimization of f(x) = F[p(x)] — t logfdet p(x)] is obtained by an 
iterative process. In order to determine a search direction at a given iterate x n one 
minimizes the quadratic approximation 

f(x n + Ax) « f{x n ) + Vf(x n fAx + ^Ax T V 2 f(x n )Ax. (41) 

This reduces to solving a linear set of equation called the Newton equation 

V 2 f(x n )Ax nt = -Vf{x n ), (42) 

which determines the search direction Ax nt . The step length s for the next iterate 
x n+l = x n + sAx nt is chosen by a backtracking line search. Here one picks the largest 
s = maxfcgM (3 k with (3 G (0, 1) such that the iterate stays feasible p(x n+1 ) > and that 
the function value decreases sufficiently, i.e., f(x n+1 ) < f(x n ) + asV f(x n ) T Ax nt with 
a G (0,0.5). The process is stopped if one has reached an appropriate solution, which 
can be identified by ||V/(x n )||2 < e. If the initial point x start is already sufficiently close 
to the true solution then the whole algorithm converges quadratically, i.e., the precision 
gets doubled at each step. 

At this point let us give the gradient and Hessian of the appearing functions. 
For the barrier term if)(x) = — logfdet p{x)] restricted to the positive domain p(x) = 
1/ dim(%) + %iBi > one gets [33] 



dxi 
d 2 ijj(x) 



-tr^x)- 1 ^], (43) 

_ trKx)- 1 ^)- 1 ^]. (44) 

Equation f|44j) shows that the Hessian of the penalty term V 2 ^(x) > is positive definite, 
such that ip{x) is indeed convex. The derivatives of the preferred fit function can be 
computed directly. For instance using the likelihood function F m \(x) = — J2k fk l°gPfc(^) 
with Pk(x) = tr[p(x)Mfc] they read 

^ = -E^w). («) 

^=E p "|)W)tr( B( M t ). (46) 

The bottleneck of such an algorithm is the actual computation of the second 
derivatives. Although the expansion coefficients of each measurement tr(BjMk) can be 
computed in advance, it is still necessary to compute Eq. (|4T)|) anew at each point x due to 
the dependence of Pk{x). With respect to that the least squares fit function bears a great 
advantage since its Hessian is constant, i.e., djdiF\ s (x) = 2 J2k w k tr(-Bj-Mfc) tr(5jM fc ), 
such that one saves time on this part. 
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At last let us comment on the optimality conditions, known as the Karush-Kuhn- 
Tucker conditions [SB]- Agiven x* is the global solution of the convex problem given 
by Eq. ([13]) if and only if □ there exists an additional Lagrange multiplier Z* such that 



the pair (x*, A*) satisfies 

-£-F(x*)-tr[A*B i ]=0, Vi, (47) 

A* > 0, p(x*) > 0, (48) 

tr[A*p(x*)] = 0. (49) 



Given the solution x* ol of the corresponding unconstrained problem with penalty 
parameter t it follows from V/(a;* ol ) = using Eq. (I43p that the gradient conditions 
are satisfied with A t = tp(x t sol )~ 1 being the Lagrange multiplier. This pair (a;* ol , A t ) also 
satisfies Eq. (14"8"]) : only the duality gap condition tr[Atp(Xg 0l )] = tdim('H) > does not 
hold exactly. However this quantity appears in the following inequality 

F(xl ol ) - tr[A tP (^ ol )] = min /(x) - tr[A t p(x)] (50) 

x:p(x)>U 

< min F(x) = F{x sol ). (51) 

x:p(x)>0 

Here one used that a;* ol is the solution of F(x) — tr[A t p(x)] because the gradient vanishes 
(and the solution is not at the border), and tr[A 4 p(x)] > for the inequality. This is 
the stated accuracy given by Eq. ([151) which relates the function value of x* ol to the true 
solution x so i. 

5.3. Additional tools 

5.3.1. Optimization of measurement settings Measurement settings, each described 
by a unit vector hi G R 3 as explained in Sec. 12. 1\ are chosen to optimize a figure of 
merit characterizing how well a given permutationally invariant target state p tar can be 
reconstructed. This is motivated as follows: A permutationally invariant state ppi is 
uniquely described by its generalized Bloch vector [JS] defined as 

bkimn = tr ( [af ® af <g> af m ® 1®"] pi p PI ) (52) 

with natural numbers satisfying k + l + m + n = N. Consequently, one possible figure 
of merit is given by the total error of all Bloch vector elements, i.e., more precisely by 




Here the multinomial coefficient weights the error of each Bloch vector by its number 
of appearance in a generic Pauli product decomposition. 

The error of each Bloch vector element must now be related to the performed 
measurements. For that note that each Bloch vector element can be expressed as 

b kl mn = C " lmn tr (t( a * • ® 10n ]piPtar) (54) 

i 

+ Sufficiency holds under the Slater regularity condition that demands a strictly feasible point p(x) > 0, 
which naturally holds for state reconstruction problems. 
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using appropriate coefficients c^ lmn and the expectation values of [(a* • a)® N ~ n <g> l® n ]pi 
which can be computed from the coarse-grained measurement outcomes M^ 1 of setting 
ai as given by Eq. (J2J) using linear combinations. Assuming independent errors one gets 

*«) = £ ([(^ • ^)^"" ® ^Ipi) • (55) 

i 

The detailed form of the error expression £ 2 tar ([(dj ■ <j^® N ~ n (g) J® n ] PI ) may depend on 
the actual physical realization, but we assume the following form 

C ([(a* ■ ® i^Ipi) = * ([(a* • ® i^Ipi) . (56) 

where A p [A] = ti{pA 2 ) — [tr(pA)] 2 is the standard variance and K a state-independent 
factor. This form fits for example well to the common error model in photonic 
experiments where count rates are assumed to follow a Poissonian distribution. For 
more details on this derivation we refer to Ref. |15j . 

For large qubit numbers N this optimization is carried out iteratively. Starting 
from randomly chosen measurement directions or from vectors which are uniformly 
distributed according to some frame potential [13], one searches for updates according 
to 

, = pa? + (l-p)ri 

i Wva? + (1 - p)fi\\' 1 ' 



i 

Here d™ is the current iterate, p < 1 a probability close to 1 and fj are randomly 
chosen unit vectors. If this new set of directions a\ leads to a smaller total error 
^totai(^i) Ptax) than the previous set then this new measurement settings form the next 
iterate d™ +1 = d™, otherwise this procedure is carried out once more. This process is 
repeated until the total error does not decrease any more. This way of optimizing the 
measurements requires a method to compute the total error £ t 2 otal (dj, p t ar) for a given 
set of measurements dj. Using the generalized Bloch vector or the spin ensemble this 
computation can be carried out again efficiently for larger qubit numbers N. 

Though this algorithm is not proven to attain the true global optimum it is still 
a straightforward technique to obtain good settings. In the end this is often sufficient, 
recalling that the true unknown state can deviate from the target state and that one 
considers "just" an overall single figure of merit. For our simulations we always use the 
optimized settings for the totally mixed state. 

Regarding this point we finally like to add that if one does not employ the minimal 
number of measurement settings, but rather an over-complete set, e.g., four times as 
much settings but four times fewer measurements per setting, then the procedure is quite 
insensitive to the chosen measurement directions. Hence, in many practical situations 
the search for optimal directions might not even be necessary and randomly chosen 
measurement directions suffice equally well. 

5. 3. 2. Statistical pretest Via the pretest one estimates the fidelity between the true p true 



and the best permutationally invariant state -Fpi(ptrue) = m ax ppi > tr ( J ^/ptm^ppi v /ptruc ) 
using only measurement results from a few settings a G T, e.g., employing only 
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T = {ei,e2, £3}. Depending on this quantity one decides whether permutationally 
invariant tomography is worth to continue. As explained in detail in Ref. [J5] this 
fidelity can be bounded by 

PpiUrue) > [tr( Ptmc Z)] 2 (58) 

with an operator Z = Y2 z k-^-k being built up by the performed measurements 
given by Eq. (j2J) and satisfying Z < P sym , where P sym denotes the projector onto the 
symmetric subspace. 

The expansion coefficients z% should be optimized to attain the best lower bound. 
For a given target state p tar this problem can be cast into a semidefinite program [33J 03] 
that can be solved efficiently using standard numerical routines. However for larger qubit 
numbers one must again employ the block structure of the measurement operators as 
given by Eq. Q to handle the operator inequality. Note that the projector on the 
symmetric subspace has a Block structure P sym j = Sj^^t- Then the final problem 
reads as 

max 4tr(p tar M fc a ) (59) 

a€T,fc 

s.t. 4Mi 3=N/2 < 1, z kKj ^ °w < N / 2 - 

a£T,k a£T,k 

If one experimentally implements this pretest one must account for additional 
statistical fluctuations. For the chosen Z one can employ the sample mean Z = ^ z%f% 
using the observed frequencies = n^./iV R in N R repetitions of setting a, as an estimate 
of the true expectation value tr(p true Z). This sample mean Z will fluctuate around the 
true mean but large deviations will become less likely, such that for an appropriately 
chosen e the quantity sign(Z— e)(Z— e) 2 is a lower bound to the true fidelity at the desired 
confidence level. The proof essentially uses the techniques employed in Refs. [13 H6] . 

Proposition 5.3 (Statistical pretest). For any Z = ^2 z%M£ < P sym let Z = 
5]z^/JVr denote the sample mean using Nn repetitions for setting a G T. If the 
data are generated by the state p truc then 

Prob [F PI (p truc ) > sign(Z - e)(Z - e) 2 ] > 1 - exp(-2iV Re 2 /C ,2 ) (60) 

with C 2 = (^^ ax — 2^ in ) 2 where <2n iax / min are ^ e respective optima for setting a over 
all outcomes k. 

Proof. The given statement follows along 

Prob [F PI (p truc ) > sign(Z - e){Z - e) 2 ] > Prob [tr( Arue Z) > Z - e] (61) 
> 1 - Prob [tr(p truc Z) <Z-e]>\- exp(-2iV R e 2 /C , 2 2 ). (62) 

Here the first inequality holds because the set of outcomes satisfying {nl : tr(p trne Z) > 
Z — e} is a subset of {n% : -Fpi(ptrue) > sign(Z — e)[Z — e) 2 } using Eq. (1581 . In the last 
inequality we use Hoeffdings tail inequality jU] to bound Prob [tr(p true Z) < Z — e] . □ 
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Note that this pretest can also be applied after the whole tomography scheme in 
which case the projector P sym = ^ z^M^ becomes accessible. Moreover, let us point out 
that a strong statistical significance, or a low e respectively, might not be achieved with 
the best expectation value as given by Eq. ( 1591) [38] ; hence optimizing Z for a rather 
mixed state is often better. 

Finally let us remark that the pretest can be improved by additional projectors, see 
supplementary material of Ref. [T5]. This leads to the bound F P i(p tIue ) > J2jP"j with 
Pj being the weight of the corresponding spin-j state of the permutationally invariant 
part of p true as given in Eq. (171). From this expression one sees that this test only works 
well for states having a rather large weight on one of these spin states. Others, like 
the totally mixed state, while clearly being permutationally invariant, are not identified 
as states close to the permutationally invariant subspace. This is in stark contrast to 
compressed sensing where the certificate succeeds for the whole class of low rank target 
states [TU] . 

5.3.3. Entanglement and MaxLik-MaxEnt principle Following the last comment from 
the previous section, we want to argue that even in the case of an inefficient certificate, 
permutationally invariant state reconstruction as given by Eq. (jSJ) is meaningful. At 
first we would like to emphasize that the permutationally invariant part of any density 
operator represents a fair representative to investigate the entanglement properties of the 
true, unknown state. This is because the transformation given Eq. (TjQ) can be achieved 
by local operations and classical communications, whereby the amount of entanglement 
cannot increase. Thus if the permutationally invariant part of the density operator is 
entangled this holds true also for the real state. 

Second, permutationally invariant state reconstruction also represents the solution 
of the maximum-likelihood maximum-entropy principle as introduced in Ref. [49] . which 
goes as follows: If the performed measurements are not tomographically complete then 
there is, in general, not a single state p m \ that maximizes the likelihood but rather 
a complete set of them. In order to single out a "good" representative, Ref. [4"9] 
proposes to choose the state which has the largest entropy, which, according to the 
Jaynes principle |50j, is the special state for which one has the fewest information. 

Proposition 5.4 (Permutationally invariant MaxLik-MaxEnt principle). Using the de- 
scribed permutationally invariant tomography scheme, the reconstructed permutationally 
invariant state given by Eq. (TJP (with the likelihood function) is also the solution of the 
maximum-likelihood maximum- entropy principle. 

Proof. Since the measurements given by Eq. §2§ are invariant and tomographically 
complete for permutationally invariant states, all density operators with the same spin 
ensemble as /Spi have the same maximum likelihood. According to Ref. [50] the state 
with maximal entropy and consistent with a given set of expectation values for operators 
M£ has the form p oc exp(J^ a k \^M%). The Lagrange multipliers A£ G K, must be 
chosen such that the given expectation values match. However, because all M% are 
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permutationally invariant we can employ the block decomposition given by Eq. fl9]) and 
finally obtain exp(£ a , fc AgMjf) = exp(ffi i J2 a , k X k M k,j ® 1) = ©i ex P(Efc K M k,j) ® 1- 
Hence we obtain the same structure as /Spi, which therefore is also the state with 
maximum entropy. □ 

6. Conclusion and outline 

In this manuscript we provided all necessary ingredients to carry out permutationally 
invariant tomography [15] in experiments with large qubit numbers. This includes, 
besides scheme specific tasks like the statistical pretest and the optimization of the 
measurement settings, in particular the state reconstruction part. Accounting for 
statistical fluctuations due to a finite amount of data, this reconstruction demands 
the solution of a non-linear large-scale optimization problem. We achieve this by first 
using a convenient toolbox to store, characterize and process permutationally invariant 
states, which largely reduces the dimension of the underlying problem and second by 
using convex optimization, which is superior compared to commonly used numerical 
routines in many respects. This makes permutationally invariant tomography a complete 
tomography method requiring only moderate measurement and data analysis effort. 

There are many questions one may pursue in this direction: First, let us stress that 
the current prototype implementation is still not optimal. As explained, the bottleneck 
is the computation of the second derivatives, hence we strongly believe that Hessian 
free-optimization, like quasi- Newton or conjugate gradients [51], or the use of other 
barrier functions more tailored to linear matrix inequalities [52] are likely to push the 
reconstruction limit further. Second, it is natural to try to exploit other symmetries in 
the development of "symmetry" tomography protocols, i.e., tomography should work 
for all states that remain invariant under the action of a specific group. Clearly any 
symmetry decreases the number of state dependent parameters, but the challenge is 
to devise efficient local measurement strategies. Interesting classes here are graph- 
diagonal [53] or, more general, locally maximally entangleable states [51], translation 
or shift-invariant states [55], or U® N invariant states [561 EZ]- Third, it is worth to 
investigate to what extent particularly designed state tomography protocols are useful in 
further tasks, like process tomography for quantum gates. For instance, permutationally 
invariant tomography might be unable to resolve the Toffoli gate [58] directly, but since 
the operation on all N target qubits is symmetric, a permutationally invariant resolution 
of this subspace (and the additional control qubit) might be sufficient. Finally let us 
point out that permutationally invariant tomography can be further restricted to the 
symmetric subspace, which often contains the desired states. This is reasonable since 
we have seen that the pretest is only good if the unknown state has a large weight in 
one of the spin states. However since the symmetric subspace grows only linearly with 
the number of particles, this tomography scheme can analyse even much more qubits 
efficiently 
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